ci: R2R-compile staged DLLs (crossgen2) before nupkg pack #1

Merged
oysteinkrog merged 1 commit into if/main from ci/r2r-compile-staged-dlls on May 1, 2026

@oysteinkrog
Member

Summary

Adds a crossgen2 (ReadyToRun) compilation step to the nupkg build pipeline so the published `InitialForce.WPF` and `InitialForce.WPF.RuntimeOverride` nupkgs contain native code, matching stock `Microsoft.WindowsDesktop.App` behavior.

Why

Stock dotnet/wpf DLLs ship with R2R native code baked in by dotnet/runtime's runtime-pack assembly step. Our fork builds the libraries via `build.cmd` but does not run that step, so the DLLs in our nupkgs were JIT-only.

JIT'd frames are slightly fatter than R2R'd frames. WPF code paths that are already deep on the stack — notably the dispatcher unhandled-exception handler loading `MessageDialog.xaml` → BAML callbacks → WPFLocalizeExtension's 800-culture iteration — overflowed the 1 MB thread stack in consumers, taking the process down instead of merely showing the user an error.

What changed

  • `tools/crossgen-staged.ps1`: new script. It downloads `Microsoft.NETCore.App.Crossgen2.win-x64` 10.0.7 from nuget.org (cached under `.tools-cache/`), runs `crossgen2` over each of the 4 staged DLLs (`PresentationCore`, `PresentationFramework`, `WindowsBase`, `System.Xaml`), and verifies the `RTR\0` magic in each output before replacing the input.
  • `.github/workflows/build.yml`: new step "R2R-compile staged DLLs (crossgen2)" inserted between staging and `dotnet pack`. It runs on both packaging trees, with `--targetarch` matching `matrix.arch` (so we get `win-arm64` R2R images for the arm64 build).
  • `.gitignore`: adds `.tools-cache/` so the downloaded crossgen2 binaries aren't tracked.
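
For readers unfamiliar with the `RTR\0` check, here is a rough Python sketch of how such a verification could work. It is not the contents of `tools/crossgen-staged.ps1` (which is PowerShell and is not reproduced in this PR text). It assumes the standard PE layout, in which the CLR header's `ManagedNativeHeader` directory points at the ReadyToRun header for a non-composite R2R image. The function names are illustrative.

```python
import struct

R2R_MAGIC = b"RTR\x00"          # READYTORUN_HEADER signature (0x00525452, little-endian)
CLR_DIRECTORY_INDEX = 14        # data directory slot of the CLR (COM descriptor) header
MANAGED_NATIVE_HEADER_OFF = 64  # offset of ManagedNativeHeader inside IMAGE_COR20_HEADER


def _rva_to_offset(data, first_section, count, rva):
    """Map an RVA to a file offset using the PE section table."""
    for i in range(count):
        base = first_section + 40 * i  # each section header is 40 bytes
        vsize, vaddr, raw_size, raw_ptr = struct.unpack_from("<IIII", data, base + 8)
        if vaddr <= rva < vaddr + max(vsize, raw_size):
            return rva - vaddr + raw_ptr
    return None


def has_r2r_header(data):
    """Return True if the image's CLR header points at an RTR-signed header."""
    try:
        if data[:2] != b"MZ":
            return False
        (pe,) = struct.unpack_from("<I", data, 0x3C)       # e_lfanew
        if data[pe:pe + 4] != b"PE\x00\x00":
            return False
        (n_sections,) = struct.unpack_from("<H", data, pe + 6)
        (opt_size,) = struct.unpack_from("<H", data, pe + 20)
        opt = pe + 24
        (opt_magic,) = struct.unpack_from("<H", data, opt)
        dirs = opt + (112 if opt_magic == 0x20B else 96)   # PE32+ vs PE32
        sections = opt + opt_size
        (clr_rva,) = struct.unpack_from("<I", data, dirs + 8 * CLR_DIRECTORY_INDEX)
        clr = _rva_to_offset(data, sections, n_sections, clr_rva)
        if clr_rva == 0 or clr is None:
            return False  # not a managed assembly
        (native_rva,) = struct.unpack_from("<I", data, clr + MANAGED_NATIVE_HEADER_OFF)
        native = _rva_to_offset(data, sections, n_sections, native_rva)
        if native_rva == 0 or native is None:
            return False  # IL-only image: no native header directory
        return data[native:native + 4] == R2R_MAGIC
    except struct.error:
        return False  # truncated or malformed image
```

A caller would read the crossgen2 output with `open(path, "rb").read()` and only overwrite the staged DLL when `has_r2r_header` returns `True`.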

Verified locally

  • crossgen2 10.0.7 successfully R2R-compiles all 4 patched DLLs from a real build
  • Output sizes: `PresentationCore` +33.7%, `PresentationFramework` +40.1%, `WindowsBase` -0.5%, `System.Xaml` +145.7% (variation reflects native-code density)
  • All outputs contain the RTR magic at the expected offset
  • Consuming the R2R'd DLLs eliminates the deep-stack SO previously exhibited by the nupkgs

Companion fix

There is also a master-side defense-in-depth fix in InitialForce/ScDesktop#6790 that defers error-dialog construction off the deep dispatcher stack. Either fix alone resolves the SO; together they give belt-and-suspenders protection to any consumer of our WPF nupkgs.

Test plan

  • CI green on this PR
  • After merge: confirm the new `if.<run_number>` nupkg contains R2R'd DLLs
  • Smoke-test consumption from MotionCatalyst with the new nupkg

🤖 Generated with Claude Code

Stock dotnet/wpf DLLs in Microsoft.WindowsDesktop.App ship with
ReadyToRun native code, baked in by dotnet/runtime's runtime-pack
assembly step. Our fork builds the libraries via build.cmd but does
not run that step, so the DLLs we ship in the InitialForce.WPF nupkg
are JIT-only. This caused stack overflows in consumers: JIT'd frames
are slightly fatter than R2R'd frames, and WPF code paths that are
already deep on the stack (dispatcher unhandled-exception handler ->
MessageDialog.xaml -> BAML -> WPFLocalizeExtension's 800-culture
iteration) overflow the 1 MB thread stack.

Add a workflow step that downloads the upstream
Microsoft.NETCore.App.Crossgen2.win-x64 NuGet package (cached under
.tools-cache/) and runs crossgen2 over the 4 staged DLLs in both
packaging trees (InitialForce.WPF and InitialForce.WPF.RuntimeOverride).
Each output is verified to contain the R2R magic before replacing the
input. --targetarch matches the matrix.arch so we get win-arm64 R2R
images for the arm64 build.

Verified locally: crossgen2 10.0.7 successfully R2R-compiles all 4
patched DLLs (PresentationCore, PresentationFramework, WindowsBase,
System.Xaml). Output sizes change by about -0.5% to +146% (varies by
native-code density), all outputs contain the RTR magic at the expected
offset, and consuming the R2R'd DLLs eliminates the deep-stack SO that
the previous nupkgs exhibited.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@oysteinkrog merged commit 7079676 into if/main on May 1, 2026
9 checks passed
oysteinkrog pushed a commit that referenced this pull request on May 12, 2026
Each call to UIElement.InputHitTest(Point, out, out, out) allocated four
small heap objects: PointHitTestParameters, InputHitTestResult, and the
two callback delegates (filter + result). At ~60 Hz cursor movement across
a moderately deep visual tree, this fires ~5-50k times per scenario.

The 2026-05-11 deep-dive (autoresearch/deep-dive-2026-05-11/T1-point-allocations.md)
flagged this as the #1 contributor to the ~71 MB combined System.Windows.Point
allocation budget across take-open + playback — estimated savings 30-40 MB.

Three changes:
- The filter callback's body uses only the `currentNode` argument and static
  UIElementHelper helpers — no `this` capture. Make it `private static` and
  cache one shared HitTestFilterCallback delegate as a static readonly field.
- Cache a single PointHitTestParameters wrapper per thread via [ThreadStatic].
  PointHitTestParameters.SetHitPoint() (already internal) mutates the inner
  Point before each VisualTreeHelper.HitTest call.
- Add Acquire/Release pooling to the nested InputHitTestResult class. The
  HitTestResultCallback's delegate target IS the instance, so the pool stores
  the (instance, callback) pair to preserve binding across cycles. On rare
  nested reentrancy, Acquire falls back to a fresh instance — same single-slot
  pattern as the existing StreamGeometryCallbackContext pool. Result and
  HitTestResult are captured into locals BEFORE Release so the post-traversal
  iteration uses only stable values.
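
The single-slot Acquire/Release pattern from the third bullet can be sketched in Python as follows. This is an illustration of the idea only, not the C# change itself: `HitResult`, `on_hit`, and `input_hit_test` are hypothetical stand-ins, and the sketch ignores threading by assuming all calls come from one UI thread.

```python
class HitResult:
    """Hypothetical stand-in for the nested InputHitTestResult class."""

    _slot = None  # the single pooled (instance, callback) pair; None while leased out

    def __init__(self):
        self.result = None

    def on_hit(self, element):
        # The callback is a bound method, so its target is this instance.
        self.result = element

    @classmethod
    def acquire(cls):
        pair = cls._slot
        if pair is None:
            # Slot is empty (first use, or nested reentrancy): fall back to a fresh pair.
            inst = cls()
            return inst, inst.on_hit
        cls._slot = None
        return pair

    @classmethod
    def release(cls, pair):
        pair[0].result = None  # drop references before pooling
        cls._slot = pair


def input_hit_test(walk, point):
    """Run one traversal; `walk` stands in for VisualTreeHelper.HitTest."""
    pair = HitResult.acquire()
    inst, callback = pair
    walk(point, callback)
    found = inst.result      # capture into a local BEFORE release
    HitResult.release(pair)
    return found             # later code touches only the local
```

Storing the callback next to the instance matters because a new bound callback would otherwise be created on every call, which is the allocation the pool is meant to avoid.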

VisualTreeHelper.HitTest is synchronous and consumes the parameters during
traversal (no retention past return). The callbacks (filter + result) don't
reinvoke InputHitTest, so reentrancy within one traversal is impossible.
Reentrancy from the post-traversal contentHost.InputHitTest chain happens
AFTER Release — pool slot is repopulated by the time recursion would run.
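
Because the parameters object is not retained past the synchronous call, one wrapper per thread can be reused safely. A minimal Python sketch of that per-thread cache follows, with `threading.local` standing in for `[ThreadStatic]`. `PointParams` and `params_for` are hypothetical names, not the WPF API.

```python
import threading


class PointParams:
    """Hypothetical stand-in for PointHitTestParameters."""

    def __init__(self, point):
        self.hit_point = point

    def set_hit_point(self, point):
        # Mutates the wrapped point in place instead of allocating a new wrapper.
        self.hit_point = point


_tls = threading.local()  # one cached wrapper per thread


def params_for(point):
    """Return this thread's cached wrapper, updated to `point`."""
    cached = getattr(_tls, "params", None)
    if cached is None:
        cached = _tls.params = PointParams(point)
    else:
        cached.set_hit_point(point)
    return cached
```

The reuse is only valid while nothing holds on to the returned wrapper after the traversal returns, which is the property the paragraph above argues for.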

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
oysteinkrog pushed a commit that referenced this pull request on May 16, 2026
Commit 7831813 ("wpf-perf(big-win T4): pool AdornerLayer._zOrderMap
value snapshot") shipped a per-instance object[] snapshot buffer
shared between MeasureOverride and ArrangeOverride to eliminate ~170 MB
of per-pass DictionaryEntry[] allocations during MotionCatalyst
take-open.

Defect: Adorner.Measure / Adorner.Arrange callouts can re-enter the
same AdornerLayer's MeasureOverride/ArrangeOverride via a nested layout
pass. A naïve shared field lets the inner call's CopyTo overwrite the
outer pass's snapshot, and its terminal Array.Clear nulls the slots
the outer is still iterating — the outer then reads a null reference
and the layout throws, leaving MotionCatalyst with a completely blank
canvas on take-open.

Fix: lease pattern. Each call captures the current field value into a
local, immediately nulls the field (so any re-entrant call allocates
its own buffer rather than aliasing), iterates on the local, and at
end of pass restores its buffer to the field — keeping whichever
buffer (own or the one a nested call left behind) is larger.
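
The lease pattern described above can be sketched in Python as follows. This is an illustration under stated assumptions, not the C# fix: `AdornerLayerSketch`, `layout_pass`, and `visit` are hypothetical names, and a plain list stands in for the `object[]` snapshot buffer.

```python
class AdornerLayerSketch:
    """Hypothetical model of a layer with a reusable snapshot buffer."""

    def __init__(self, adorners):
        self.adorners = list(adorners)
        self._snapshot = None  # shared buffer field; None while leased out

    def layout_pass(self, visit):
        # Lease: take the buffer and null the field, so a re-entrant call
        # allocates its own buffer instead of aliasing this one.
        buf = self._snapshot
        self._snapshot = None

        n = len(self.adorners)
        if buf is None or len(buf) < n:
            buf = [None] * n
        buf[:n] = self.adorners        # snapshot the current values

        for i in range(n):
            visit(buf[i])              # may re-enter layout_pass

        for i in range(n):
            buf[i] = None              # clear only our own buffer

        # Restore: keep whichever buffer is larger, ours or the one a
        # nested call left behind in the field.
        left_behind = self._snapshot
        if left_behind is None or len(left_behind) < len(buf):
            self._snapshot = buf
```

On the non-re-entrant path the same list is leased and restored on every pass, so no new buffer is allocated after the first call. A nested pass sees `None` in the field and pays for one fresh list.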

Steady state on the non-re-entrant path remains zero-allocation: the
field holds the grown buffer, every subsequent call leases-clears-
copies-iterates-clears-restores in place. Re-entrant calls pay one
object[] allocation per nesting level, matching the worst case of the
pre-7831813a baseline.

Validated end-to-end via MCP UI screenshots on MotionCatalyst:
  - HEAD before fix: take-open shows fully black canvas
  - HEAD + this fix: identical to vanilla upstream/release/10.0
    (Carl Hansen golf swing, Frame 0/1240, both video viewports
     rendered, Pressure & Stance heatmap, Launch Monitor, all data
     boxes populated, playback toggles cleanly)

All 358 perf commits in PRs #1-#4 preserved.